Methods and Tools of Computational Linguistics for the Classification of Natural Non-referential Ellipsis in Spanish (review)

Author

  • Vera Danilova
Abstract

This article presents a brief survey of the few works devoted to modern natural language processing (NLP) approaches to the analysis of impersonal sentences in Spanish. Such an analysis consists in the classification of non-referential ellipsis, which can be used in machine translation systems. The NLP approaches for Spanish are mainly based on the work of Rello published in 2010. These approaches do not use a proper classification of impersonal models, but rather a relatively descriptive distribution without strict criteria. The structured classification presented in this article, based on historical and semantic data of an interlingual nature, can also be applied to create linguistically motivated classes for machine learning methods. The automatic classification method employed in Rello's work is based on the instance-based learner from the well-known WEKA package.
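The abstract notes that Rello's automatic classification relies on an instance-based learner. As a minimal, hedged illustration of what instance-based learning means in this setting, the Python sketch below labels a finite verb by finding its nearest stored training instance (a plain k-nearest-neighbour classifier). The three labels follow the ternary scheme described for Elliphant (explicit subject, zero pronoun, impersonal construction). The five binary features, the toy training examples, and the names `VerbInstance`, `hamming` and `knn_classify` are hypothetical; they are not Rello's feature set, data, or WEKA's implementation, and not necessarily the specific learner Rello used.

```python
from collections import Counter
from dataclasses import astuple, dataclass


# Hypothetical binary features for a finite verb in context (illustrative only,
# not the feature set used by Rello).
@dataclass(frozen=True)
class VerbInstance:
    has_np_before: bool     # a candidate subject noun phrase precedes the verb
    agrees_with_np: bool    # the verb agrees in person/number with that noun phrase
    has_se_clitic: bool     # the verb carries the clitic "se"
    weather_or_haber: bool  # the lemma is a weather verb or existential "haber"
    third_singular: bool    # the verb is in the third person singular


def hamming(a: VerbInstance, b: VerbInstance) -> int:
    """Count the features on which two instances differ."""
    return sum(x != y for x, y in zip(astuple(a), astuple(b)))


def knn_classify(train, query: VerbInstance, k: int = 1) -> str:
    """Label `query` by majority vote among its k nearest training instances."""
    ranked = sorted(train, key=lambda pair: hamming(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy labelled instances (invented for illustration, not corpus data).
TRAIN = [
    (VerbInstance(True, True, False, False, True), "explicit_subject"),    # "Juan come."
    (VerbInstance(True, True, False, False, False), "explicit_subject"),   # "Nosotros comemos."
    (VerbInstance(False, False, False, False, False), "zero_pronoun"),     # "Comemos."
    (VerbInstance(False, False, False, False, True), "zero_pronoun"),      # "Come."
    (VerbInstance(False, False, False, True, True), "impersonal"),         # "Llueve."
    (VerbInstance(False, False, True, False, True), "impersonal"),         # "Se vive bien."
]

if __name__ == "__main__":
    # "Nieva." (it is snowing): no subject, weather verb, third person singular.
    print(knn_classify(TRAIN, VerbInstance(False, False, False, True, True)))  # impersonal
```

With k = 1 the query simply inherits the label of its closest stored example; larger k values take a majority vote, which on a training set this small can be swayed by neighbours from other classes. For binary features, Euclidean distance would rank neighbours in the same order as this count of differing features, since it is the square root of that count.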

Similar resources

Elliphant: A Machine Learning Method for Identifying Subject Ellipsis and Impersonal Constructions in Spanish

This thesis presents Elliphant, a machine learning system for classifying Spanish subject ellipsis as either referential or non-referential. Linguistically motivated features are incorporated in a system which performs a ternary classification: verbs with explicit subjects, verbs with omitted but referential subjects (zero pronouns), and verbs with no subject (impersonal constructions). To the ...

A machine learning method for identifying impersonal constructions and zero pronouns in Spanish

In this paper, we present a machine learning system for classifying subject ellipsis in Spanish as either referential or non-referential. To the best of our knowledge, this is the first attempt to automatically identify non-referential ellipsis in Spanish. An evaluation of our system against 6,827 finite verbs shows an accuracy of 87%.

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP), made harder by the growing volume and heterogeneity of information and its unstructured form. One of the core information extraction tasks is relation extraction wh...

Cultural Influence on the Expression of Cathartic Conceptualization in English and Spanish: A Corpus-Based Analysis

This paper investigates the conceptualization of emotional release from a cognitive linguistics perspective (Cognitive Metaphor Theory). The metaphor WEEPING IS A MEANS OF LIBERATING CONTAINED EMOTIONS is grounded in universal embodied cognition and is reflected in linguistic expressions in English and Spanish. Lexicalization patterns which encapsulate this conceptualization i...

Corpus based coreference resolution for Farsi text

"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two expressions are coreferent when both refer to a single entity in the text or in the real world. The main task of coreference resolution systems is therefore to identify expressions that refer to a unique entity. A coreference resolution tool could be...

Publication date: 2012